Secure Statistical Analysis of Distributed Databases

Authors

  • Alan F. Karr
  • Xiaodong Lin
  • Ashish P. Sanil
  • Jerome P. Reiter
Abstract

A continuing need in the contexts of homeland security, national defense and counterterrorism is for statistical analyses that “integrate” data stored in multiple, distributed databases. There is some belief, for example, that integration of data from flight schools, airlines, credit card issuers, immigration records and other sources might have prevented the terrorist attacks of September 11, 2001, or might be able to prevent recurrences. In addition to significant technical obstacles, not the least of which is poor data quality [32, 31], proposals for large-scale integration of multiple databases have engendered significant public opposition. Indeed, the outcry has been so strong that some plans have been modified or even abandoned. The political opposition to “mining” distributed databases centers on deep, if not entirely precise, concerns about the privacy of database subjects and, to a lesser extent, database owners. The latter is an issue, for example, for databases of credit card transactions or airline ticket purchases. Integrating the data without protecting ownership could be problematic for all parties: the companies would reveal who their customers are, and the companies at which a given person is a customer would also be revealed. For many analyses, however, it is not actually necessary to integrate the data. Instead, as we show in this paper, using techniques from computer science known generically as secure multi-party computation, the database holders can share analysis-specific sufficient statistics anonymously, in a way that still allows the desired analysis to be performed in a principled manner. If the sole concern is protecting the source rather than the content of data elements, it is even possible to share the data themselves, in which case any analysis can be performed. The same need arises in non-security settings as well, especially in scientific and policy investigations.
For example, a regression analysis on integrated state databases about factors influencing student performance
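
To make "sharing analysis-specific sufficient statistics" concrete, here is a minimal sketch, not the authors' implementation, of one common secure multi-party computation primitive, ring-based secure summation, used to pool the linear-regression sufficient statistics X^T X and X^T y. It assumes horizontally partitioned data (each holder has different records on the same variables) and semi-honest parties that follow the protocol and do not collude. The names `MODULUS`, `SCALE`, `secure_sum` and `secure_regression` are hypothetical, chosen for illustration.

```python
import secrets

import numpy as np

MODULUS = 2**127 - 1  # hypothetical: a modulus far larger than any true sum
SCALE = 10**6         # hypothetical fixed-point scale for encoding real numbers


def encode(x: float) -> int:
    """Fixed-point encode a real number as an integer mod MODULUS."""
    return round(x * SCALE) % MODULUS


def decode(v: int) -> float:
    """Invert encode, treating the upper half of the ring as negative."""
    if v > MODULUS // 2:
        v -= MODULUS
    return v / SCALE


def secure_sum(local_values: list[list[float]]) -> list[float]:
    """Ring-based secure summation of equal-length vectors held by K parties.

    Party 1 masks its vector with a uniformly random vector R; each later
    party adds its own encoded vector to the masked running total it receives;
    party 1 finally subtracts R. Each party sees only masked partial sums.
    """
    n = len(local_values[0])
    mask = [secrets.randbelow(MODULUS) for _ in range(n)]
    running = [(m + encode(x)) % MODULUS for m, x in zip(mask, local_values[0])]
    for values in local_values[1:]:  # the running total is passed around the ring
        running = [(r + encode(x)) % MODULUS for r, x in zip(running, values)]
    total = [(r - m) % MODULUS for r, m in zip(running, mask)]
    return [decode(t) for t in total]


def secure_regression(parties: list[tuple[np.ndarray, np.ndarray]]) -> np.ndarray:
    """Least-squares coefficients on the union of the parties' records."""
    p = parties[0][0].shape[1]
    # Each party computes only its local sufficient statistics X_k^T X_k, X_k^T y_k.
    local = [np.concatenate([(X.T @ X).ravel(), X.T @ y]) for X, y in parties]
    pooled = np.array(secure_sum([v.tolist() for v in local]))
    xtx = pooled[: p * p].reshape(p, p)
    xty = pooled[p * p:]
    return np.linalg.solve(xtx, xty)


# Usage: three hypothetical agencies holding simulated records.
rng = np.random.default_rng(0)
beta_true = np.array([1.0, -2.0, 0.5])
parties = []
for n in (40, 60, 50):
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    y = X @ beta_true + rng.normal(scale=0.1, size=n)
    parties.append((X, y))

beta_secure = secure_regression(parties)
X_all = np.vstack([X for X, _ in parties])
y_all = np.concatenate([y for _, y in parties])
beta_pooled = np.linalg.lstsq(X_all, y_all, rcond=None)[0]
print(np.allclose(beta_secure, beta_pooled, atol=1e-4))
```

Because X^T X and X^T y are sums over records, adding the per-party statistics reproduces the pooled least-squares fit up to fixed-point rounding, without any party releasing its own records. Under these assumptions, what is revealed is only the aggregate statistics, which can themselves leak information when there are few parties.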

Similar resources

"Secure" Log-Linear and Logistic Regression Analysis of Distributed Databases

The machine learning community has focused on confidentiality problems associated with statistical analyses that “integrate” data stored in multiple, distributed databases where there are barriers to simply integrating the databases. This paper discusses various techniques which can be used to perform statistical analysis for categorical data, especially in the form of log-linear analysis and l...

Secure Regression on Distributed Databases

This article presents several methods for performing linear regression on the union of distributed databases that preserve, to varying degrees, confidentiality of those databases. Such methods can be used by federal or state statistical agencies to share information from their individual databases, or to make such information available to others. Secure data integration, which provides the lowe...

Secure Statistical Analysis of Distributed Databases, Emphasizing What We Don't Know

Over the past several years, the National Institute of Statistical Sciences (NISS) has developed methodology to perform statistical analyses that, in effect, integrate data in multiple, distributed databases, but without literally bringing the data together in one place. In this paper, we summarize that research, but focus on issues that are not understood. These include inability to perform ex...

Secure, Privacy-Preserving Analysis of Distributed Databases

There is clear value, in both industrial and government settings, derived from performing statistical analyses that, in effect, integrate data in multiple, distributed databases. However, the barriers to actually integrating the data can be substantial or even insurmountable. Corporations may be unwilling to share proprietary databases such as chemical databases held by pharmaceutical manufactu...

Regression on Distributed Databases via Secure Multi-Party Computation

We present a method for performing linear regression on the union of distributed databases that does not entail constructing an integrated database, and therefore preserves confidentiality of the individual databases. The method can be used by statistical agencies to share information from their individual databases, or to make such information available to others.

Publication date: 2005